Add some doc about interface endpoint layout#13281
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
tclinkenbeard-oai
left a comment
There was a problem hiding this comment.
Generated by Codex.
What is it trying to do?
Document RPC interface endpoint layout and explain a latent CommitProxyInterface::setThrottledShard routing bug introduced when earlier endpoint fields were deleted.
Is it correct?
Partially. The anchor-plus-offset layout and the active mismatch are real, and all visible CI checks are green. However, the compatibility explanation is unsafe.
Endpoint offsets are a compatibility contract for binaries sharing a compatible protocol version. FlowTransport accepts compatible peers, not only identical builds. The cluster controller serializes ServerDBInfo, workers deserialize it using their local protocol version, and Ratekeeper consumes its CommitProxyInterface to send setThrottledShard. Mixed compatible binaries can therefore reconstruct different offsets for the same anchor.
The multiversion client does not establish identical source trees either: clients are indexed by normalizedVersion(), and compatible protocol changes keep the same connection.
I reviewed the diff and relevant public source paths only. I did not run builds or tests.
Are there bugs?
-
The section saying indices can be repacked within a protocol version, and that
legacyGetConsistentReadVersionhas no compatibility reason to remain, is incorrect. Repacking requires preserving historical slots or making the protocol version incompatible.ProtocolVersions.cmakeexplicitly calls out communication changes. -
The concrete numbers are stale after PR #13275. Current
mainreconstructssetThrottledShardwith offset12, whileinitEndpoints()registers it at vector index10. The mismatch remains real, but the documented13 -> 11fix is both outdated and unsafe for compatible mixed binaries. -
The failure is not entirely silent. An unresolved stream endpoint causes an
WLTOKEN_ENDPOINT_NOT_FOUNDresponse, which records anEndpointNotFoundevent. The hot-shard update is still not delivered.
Are there omissions?
The document should describe the mixed-binary compatibility boundary and recommend reserving historical slots with placeholders. A regression test should verify both local registration alignment and stable endpoint layouts across compatible versions.
Are there better ways of doing things?
Rewrite the compatibility subsection around the actual invariant: adjusted endpoint positions must remain stable for all compatible binaries. Preserve removed slots explicitly, or bump the protocol version incompatibly when intentionally repacking an interface.
Should this CL be LGTMd?
Not yet. The latent bug report is useful, but the compatibility guidance and stale remediation need correction before this should be merged.
… tree, replace with what seems like more accurate info. Also update misc details. Also fix the bugs with earlier refactoring rather than just talk about them.
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
|
CI's failed with and So I recan them. If it's this PR it should probably be deterministic because I flat out changed some RPC interfaces, which I doubt are being used, but maybe they are. However these ctests pass locally: |
Generated in the process of reviewing PR #13275.
Updated to describe techniques to be used across protocol-compatible releases (probably rare to never) and techniques (interface compaction, the agent called it) that can be used across protocol-incompatible releases. Use the latter to address a latent bug from prior code removal on main.